Abstract

A large number of discrete optimization problems, including Vertex
Cover, Set Cover and Feedback Vertex Set, can be unified into the class of
covering problems. Several of them were shown to be inapproximable by
deterministic algorithms. This article proposes a new random approach,
called Choose Outsiders First, which consists in randomly selecting
elements that are excluded from the cover. We show that this approach
leads to random outputs whose mean size is at most twice that of an
optimal solution.

In his landmark paper in complexity theory, R. Karp provides a list of
21 NP-complete problems from which most NP-completeness results are
deduced. Among them are the extensively studied Vertex Cover, Set Cover,
Feedback Vertex (or Arc) Set and Hitting Set problems, which belong to the class
of covering problems. Covering problems ask how large a certain combinatorial
structure has to be to cover another one, and have a wide range of applications in
all areas involving combinatorial optimization, including VLSI systems
[10], routing [6] and scheduling [7]. In the last decades, they have also become
central in computational biology [12], as parsimony is often taken as the
criterion for choosing between the different evolutionary scenarios explaining
the observations.

Most covering problems are NP-complete, so in practice they are solved
with heuristics. The proposed algorithms can mainly be classified
into two families. The first one consists of the primal-dual approaches, which are
based on the formulation of covering problems as integer linear programming
problems [13]. The second family of approximation algorithms is based on local
ratio techniques, which consist in solving a problem locally and extending the
solution [5], [3]. A common measure of the quality of those heuristics is their
approximation factor. The literature about approximation results for covering
problems is huge, and an overview can be found in pQ. The main covering
problems listed above were shown to be APX-hard. Set Cover cannot even be
approximated within better than a logarithmic factor, whereas the constant-factor
approximability of the Hitting Set and Directed Feedback Vertex (or Arc) Set
problems is still an open question. The best known solutions for Vertex Cover and
Undirected Feedback Vertex Set have an approximation ratio of 2.

One way to reach better approximation results is to use random algorithms
and to study the mean approximation ratio of their outputs. A random
local ratio approach proposed in [3] yields, for instance, a mean approximation
ratio of 2 for the Vertex Cover problem, and a ratio equal to the maximum size of
the sets for the Set Cover and Hitting Set problems.

In this paper, we propose a new random algorithm for covering problems.
Its main difference from previously studied heuristics is that the aim is not to select
good candidates for the cover but to randomly exclude elements from the cover.
This amounts to assigning a random order to the elements and considering them
in increasing order. An element is then added to the cover if and only if it has to
be added in order not to miss a structure which has to be covered. This idea was
introduced in the case of the unweighted Vertex Cover in [5] and was proved to
yield a mean 2-approximation for this particular covering problem [5]. We show
that this approach, which we call Choose Outsiders First, is in fact much more
general, in the sense that it can be applied to any covering problem and yields a
mean approximation ratio of 2. This is, to our knowledge, the first approximation
result for which the ratio is independent of the input for problems like
Set Cover or Directed Feedback Vertex Set.

1 The algorithm 

Following Bar-Yehuda's [3] formalism, a covering problem is a triple
$(U,\ f : 2^U \to \{0,1\},\ \omega : U \to \mathbb{R}^+)$ where $U$ is a finite set, $f$ is monotone, i.e.,
$A \subseteq B \Rightarrow f(A) \le f(B)$, and $f(U) = 1$. For a set $C \subseteq U$, $\omega(C) = \sum_{x \in C} \omega(x)$
is called the weight of $C$. A set $C \subseteq U$ is a cover if $f(C) = 1$. The problem is
then to find a cover of minimum weight, that is a set $C^* \subseteq U$ such that

\[ \omega(C^*) = \min\{\omega(C) : C \subseteq U \text{ and } f(C) = 1\}. \]
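To make the formalism concrete, the following sketch encodes Vertex Cover as a covering problem in the sense above. The helper names `vertex_cover_oracle` and `min_weight_cover`, the example path graph and the unit weights are illustrative assumptions, not part of the article. The brute-force search is exponential and is only meant for tiny instances.

```python
from itertools import combinations

def vertex_cover_oracle(edges):
    """Return the monotone function f: f(C) = 1 if C touches every edge, else 0."""
    def f(C):
        return 1 if all(u in C or v in C for u, v in edges) else 0
    return f

def min_weight_cover(universe, f, weight):
    """Brute-force search for a minimum-weight cover (hypothetical helper)."""
    best = None
    elems = sorted(universe)
    for r in range(len(elems) + 1):
        for subset in combinations(elems, r):
            C = frozenset(subset)
            if f(C) == 1:
                w = sum(weight[x] for x in C)
                if best is None or w < best[0]:
                    best = (w, C)
    return best

# Example instance (assumption): the path a - b - c - d with unit weights.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
U = {"a", "b", "c", "d"}
f = vertex_cover_oracle(edges)
omega = {x: 1 for x in U}
opt_weight, opt_cover = min_weight_cover(U, f, omega)
```

Other covering problems fit the same interface by swapping the oracle, for instance an oracle that tests whether a set of arcs hits every cycle for Feedback Arc Set.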

To do so, we consider the algorithm Choose Outsiders First, which relies on
the idea that if the optimal cover is small, a randomly chosen vertex has a high
probability of not being contained in the optimal solution. Therefore, two sets
OUT and IN are considered and, at each step, a vertex is randomly chosen and
put into OUT, that is, it is considered not to be in the cover. However, from time
to time, a structure which has to be covered has seen all its elements but one
put into OUT. This last element then has to be put into the cover and is added
to the IN set. Once all the elements of U have been classified into OUT or IN,
the set IN is a cover and is output by the algorithm.
The pseudo-code of Choose Outsiders First is given in Algorithm 1. At each
step of the algorithm, we say that a vertex is available if it has not been classified
yet, and we denote by $A$ the set of available vertices, that is $A = U \setminus (\mathrm{OUT} \cup \mathrm{IN})$. The
pseudo-code of Algorithm 1 is written using A, IN and OUT at each step
for better readability, but in practice the algorithm can be implemented by updating
only A and IN, or OUT and IN, since the union of the three sets is always U.
Note that if the condition of Line 5 is checked in polynomial time, which is
the case if the problem is in NP, the total running time is polynomial.

The probability distribution used to choose the excluded vertex at each step
is the one proportional to the weights of the available vertices. Elements of
small weight are therefore excluded with lower probability and thus favored to
be in the output. Note that in the case of an unweighted covering problem, the
algorithm picks the excluded vertex uniformly.
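As a small worked illustration of this distribution (the three elements and their weights are made up for the example and do not come from the article), suppose the available set and weights are as follows; each element is then excluded with probability equal to its share of the total available weight:

```latex
\[
A=\{a,b,c\},\qquad \omega(a)=1,\quad \omega(b)=2,\quad \omega(c)=3,\qquad \omega(A)=6,
\]
\[
P(a \text{ is chosen})=\tfrac{1}{6},\qquad
P(b \text{ is chosen})=\tfrac{2}{6}=\tfrac{1}{3},\qquad
P(c \text{ is chosen})=\tfrac{3}{6}=\tfrac{1}{2}.
\]
```

The heaviest element $c$ is the most likely to be sent to OUT, which is how light elements end up favored in the cover.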



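Algorithm 1: Choose Outsiders First

1 $\mathrm{IN} = \emptyset$, $\mathrm{OUT} = \emptyset$, $A = U$
2 while $A \neq \emptyset$ do
3     Pick randomly $u \in A$ with probability $\omega(u)/\omega(A)$ ;
4     $\mathrm{OUT} = \mathrm{OUT} \cup \{u\}$ ;
5     for $v \in U \setminus (\mathrm{IN} \cup \mathrm{OUT})$ such that $f(U \setminus (\mathrm{OUT} \cup \{v\})) = 0$ do
6         $\mathrm{IN} = \mathrm{IN} \cup \{v\}$
7     end
8     $A = U \setminus (\mathrm{OUT} \cup \mathrm{IN})$ ;
9 end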
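The pseudo-code can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name `choose_outsiders_first`, the oracle interface `f(set) -> 0 or 1` and the example instance are assumptions. Like the pseudo-code, it assumes that removing any single element from $U$ still leaves a cover (true for Vertex Cover on a loopless graph), since no forcing step is performed before the first pick.

```python
import random

def vertex_cover_oracle(edges):
    """f(C) = 1 if C touches every edge, else 0 (example oracle, an assumption)."""
    def f(C):
        return 1 if all(u in C or v in C for u, v in edges) else 0
    return f

def choose_outsiders_first(universe, f, weight, rng=None):
    """Sketch of Algorithm 1; returns the set IN."""
    rng = rng or random.Random()
    U = set(universe)
    IN, OUT = set(), set()
    A = set(U)
    while A:
        # Line 3: pick u in A with probability weight[u] / weight(A).
        candidates = sorted(A)
        u = rng.choices(candidates, weights=[weight[x] for x in candidates])[0]
        # Line 4: exclude u from the cover.
        OUT.add(u)
        # Lines 5-7: every unclassified v whose exclusion would break the cover
        # is forced into IN.
        forced = [v for v in U - IN - OUT if f(U - (OUT | {v})) == 0]
        IN.update(forced)
        # Line 8: recompute the available elements.
        A = U - OUT - IN
    return IN

# Example instance (assumption): the path a - b - c - d with unit weights.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
U = {"a", "b", "c", "d"}
f = vertex_cover_oracle(edges)
omega = {x: 1 for x in U}
cover = choose_outsiders_first(U, f, omega, random.Random(0))
```

Passing a seeded `random.Random` makes a run reproducible; each iteration calls the oracle at most $|U|$ times, so the number of oracle calls is at most $|U|^2$.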



The size of the output cover is a random variable, which we call CoverSize.
To assess the efficiency of the algorithm, we have to relate the values of CoverSize
to the size of an optimal solution. Let us first show that this optimal size is equal to
$\min(\mathit{CoverSize})$.

Theorem 1. Any optimal cover $C^*$ has a nonzero probability of being output by
Choose Outsiders First. Hence, the optimal size of a cover is $\min(\mathit{CoverSize})$.

Proof. Let $C^*$ be an optimal cover. Consider a run of the algorithm such that,
whenever possible, the randomly picked vertex is chosen in $U \setminus C^*$. Let us show
by induction that at each step, $\mathrm{OUT} \cap C^* = \emptyset$ and $\mathrm{IN} \subseteq C^*$. Note that this is
trivially true at the beginning of the algorithm.

Suppose now that it is true at some point just before a random vertex is picked,
and suppose that no vertex in $U \setminus C^*$ is available. Then $A \subseteq C^*$, $\mathrm{IN} \subseteq C^*$ and
$\mathrm{OUT} \cap C^* = \emptyset$, that is $U \setminus \mathrm{OUT} = C^*$. But if there is a vertex $v$ in $A$, it has
not been put into IN in the previous round, which means that the condition
at Line 5 was not satisfied. Hence, $U \setminus (\mathrm{OUT} \cup \{v\}) = C^* \setminus \{v\}$ is a cover,
which contradicts the minimality of $C^*$. Consequently, a vertex of $U \setminus C^*$ has
to be available, and it is such a vertex which is chosen. Thus the two desired set
relations are still valid after Line 4.
Figure 1: Consider the Feedback Arc Set problem on the graph G on the left,
that is, finding a set of arcs of minimal weight hitting all the cycles of G. Assume
that b is picked first, followed by g, c (d is then added to IN) and h. At this
point, we have OUT = {b, c, g, h} and IN = {d}. The right part of the figure
shows the resulting incompatibility graph, where two edges of G are linked if
adding them both to OUT creates a cycle containing no edge in IN.

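Suppose now that they are valid after Line 4 and let $v$ be an element which is
added to IN at Line 6. Then $v$ satisfied the condition on Line 5, which means
that $U \setminus (\mathrm{OUT} \cup \{v\})$ is not a cover. But if $v \notin C^*$, then $C^* \subseteq U \setminus (\mathrm{OUT} \cup \{v\})$,
which would contradict the monotonicity of the covering property.
Thus, only vertices of $C^*$ are added to IN, so that the set relations remain true
after Line 6. When the algorithm stops, $\mathrm{OUT} \cup \mathrm{IN} = U$, so the two relations
give $\mathrm{IN} = C^*$; since every pick of this run has positive probability, so does the run.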

□ 
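Theorem 1 can be checked by hand on tiny instances by enumerating every sequence of picks that has positive probability. The sketch below does this exhaustively; the function name `reachable_outputs` and the star-graph instance are illustrative assumptions, and the enumeration is exponential in $|U|$.

```python
def vertex_cover_oracle(edges):
    """f(C) = 1 if C touches every edge, else 0 (example oracle, an assumption)."""
    def f(C):
        return 1 if all(u in C or v in C for u, v in edges) else 0
    return f

def reachable_outputs(universe, f):
    """Collect every set IN that some run of Choose Outsiders First can output."""
    U = frozenset(universe)
    results = set()

    def explore(IN, OUT):
        A = U - IN - OUT
        if not A:
            results.add(IN)
            return
        for u in A:  # every available vertex can be picked (positive weights)
            new_out = OUT | {u}
            forced = frozenset(
                v for v in U - IN - new_out if f(U - (new_out | {v})) == 0
            )
            explore(IN | forced, new_out)

    explore(frozenset(), frozenset())
    return results

# Example instance (assumption): a star with center c and leaves x, y, z.
edges = [("c", "x"), ("c", "y"), ("c", "z")]
U = {"c", "x", "y", "z"}
f = vertex_cover_oracle(edges)
outputs = reachable_outputs(U, f)
smallest = min(len(C) for C in outputs)
```

On this star, picking the center first forces all three leaves into IN, while picking any leaf first forces the center, so the reachable outputs are the optimal cover `{c}` and the cover `{x, y, z}`.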

2 Analysis of the mean approximation ratio 

The key structure for the analysis of the Choose Outsiders First algorithm is a
graph encoding the fact that the choice of a vertex to put into OUT may force
some others to go into IN: consider two sets OUT and IN generated by the
algorithm as they are at the beginning of an iteration of the loop at Line 2. We define
the incompatibility graph $G_{\mathrm{OUT},\mathrm{IN}}$ as follows:

• $V(G_{\mathrm{OUT},\mathrm{IN}}) = A$

• $(u, v)$ is an edge of $G_{\mathrm{OUT},\mathrm{IN}}$ if $f(U \setminus (\mathrm{OUT} \cup \{u, v\})) = 0$.

The vertices of $G_{\mathrm{OUT},\mathrm{IN}}$ are the elements of U which still have to be classified,
and two of them are linked by an edge if they are incompatible, that is, both of
them cannot be added simultaneously to OUT, as putting all other elements in
IN would not lead to a cover. Note that the incompatibility
graph changes when the sets OUT and IN are updated. Moreover, if $u$ denotes
the vertex put into OUT at Line 4, the set of vertices put into IN at Line 6 is
exactly the neighborhood $N(u)$ of $u$ in $G_{\mathrm{OUT},\mathrm{IN}}$.

An example of an incompatibility graph is shown in Figure 1 in the context of
a Feedback Arc Set problem.
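The definition translates directly into code. The sketch below builds the incompatibility graph as an adjacency dictionary from the oracle $f$ and a pair (OUT, IN); the function name `incompatibility_graph` and the Vertex Cover instance are assumptions made for illustration.

```python
from itertools import combinations

def vertex_cover_oracle(edges):
    """f(C) = 1 if C touches every edge, else 0 (example oracle, an assumption)."""
    def f(C):
        return 1 if all(u in C or v in C for u, v in edges) else 0
    return f

def incompatibility_graph(universe, f, IN, OUT):
    """Adjacency sets of G_{OUT,IN}: vertices are the available elements A, and
    u, v are adjacent when excluding both (on top of OUT) leaves no cover."""
    U = set(universe)
    OUT = set(OUT)
    A = U - set(IN) - OUT
    adj = {u: set() for u in A}
    for u, v in combinations(sorted(A), 2):
        if f(U - (OUT | {u, v})) == 0:
            adj[u].add(v)
            adj[v].add(u)
    return adj

# Example instance (assumption): the path a - b - c - d.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
U = {"a", "b", "c", "d"}
f = vertex_cover_oracle(edges)
g0 = incompatibility_graph(U, f, IN=set(), OUT=set())
g1 = incompatibility_graph(U, f, IN={"b"}, OUT={"a"})
```

For Vertex Cover with empty OUT and IN, two vertices are incompatible exactly when they are joined by an edge, so `g0` coincides with the input graph; after `a` is excluded and `b` is forced in, only `c` and `d` remain and they are still incompatible.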
Consider again any pair (OUT, IN) of sets generated by the algorithm. Let
$X_{\mathrm{OUT},\mathrm{IN}}$ be the random variable counting the weight of the elements of U
which will be added to IN in the future. The weight of the elements already
in IN is not counted here. In particular, $X_{\mathrm{OUT},\mathrm{IN}} = 0$ if $\mathrm{OUT} \cup \mathrm{IN} = U$, and
$\mathit{CoverSize} = X_{\emptyset,\emptyset}$.

Theorem 1 can easily be adapted to show that the minimum weight of
the vertices to add to IN in order to obtain a cover containing all the vertices
of IN and none of OUT is $\min(X_{\mathrm{OUT},\mathrm{IN}})$.

Lemma 2. Let $G_{\mathrm{OUT},\mathrm{IN}}$ be an incompatibility graph and $S$ the set of vertices
corresponding to a minimum solution, that is such that $\mathrm{IN} \cup S$ is a cover and
$\omega(S) = \min(X_{\mathrm{OUT},\mathrm{IN}})$. For any vertex $u$ of $G_{\mathrm{OUT},\mathrm{IN}}$, denote by $N_S(u)$ the set of its
neighbors in $S$. Then:

1. the set $H$ of the vertices which are not in $S$ is an independent set.

2. for every vertex $u$,

\[ \min(X_{\mathrm{OUT} \cup \{u\},\, \mathrm{IN} \cup N(u)}) \le \min(X_{\mathrm{OUT},\mathrm{IN}}) - \sum_{v \in N_S(u)} \omega(v). \]

Proof. 1. Suppose that an edge links two vertices $u$ and $v$ of $H$. It means
that $U \setminus (\mathrm{OUT} \cup \{u, v\})$ is not a cover, which, together with the fact that
$\mathrm{IN} \cup S \subseteq U \setminus (\mathrm{OUT} \cup \{u, v\})$ is a cover, contradicts the monotonicity of
the covering property.

2. When $u$ is added to OUT, the whole neighborhood of $u$ is added to IN.
In particular, all the vertices of $S \cap N(u)$ are added to IN. Hence, starting
from $\mathrm{OUT}' = \mathrm{OUT} \cup \{u\}$ and $\mathrm{IN}' = \mathrm{IN} \cup N(u)$, it is possible to complete
$\mathrm{IN}'$ into a cover by adding the vertices of $S \setminus N(u)$. The optimal solution
is therefore of weight at most $\omega(S) - \sum_{v \in N_S(u)} \omega(v)$.

□ 

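Theorem 3. For every pair of sets OUT and IN that may be generated by the
algorithm,

\[ \mathbb{E}(X_{\mathrm{OUT},\mathrm{IN}}) \le 2 \min(X_{\mathrm{OUT},\mathrm{IN}}). \]

In particular, applying it to $\mathrm{OUT} = \emptyset$ and $\mathrm{IN} = \emptyset$ yields

\[ \mathbb{E}(\mathit{CoverSize}) \le 2 \min(\mathit{CoverSize}). \]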

Proof. The proof is done by induction on $|A|$.

If $|A| = 0$, $X_{\mathrm{OUT},\mathrm{IN}}$ is constant and equal to 0, so that the theorem trivially
holds.

Let us consider a pair (OUT, IN) generated by the algorithm and suppose
that the theorem holds for every pair $(\mathrm{OUT}', \mathrm{IN}')$ with $\mathrm{OUT} \subsetneq \mathrm{OUT}'$ and
$\mathrm{IN} \subseteq \mathrm{IN}'$.

To improve readability, the indices OUT and IN are omitted in the rest of
this proof: $X$ (resp. $G$) stands for $X_{\mathrm{OUT},\mathrm{IN}}$ (resp. $G_{\mathrm{OUT},\mathrm{IN}}$) and $X_{+u,+N(u)}$
for $X_{\mathrm{OUT} \cup \{u\},\, \mathrm{IN} \cup N(u)}$.

As in Lemma 2, $S$ is an optimal solution given OUT and IN, and $N$,
$N_S$ and $N_H$ stand for the different neighborhoods in the incompatibility graph
$G_{\mathrm{OUT},\mathrm{IN}}$.

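\[
\begin{aligned}
\mathbb{E}(X) &= \sum_{u \in A} \mathbb{E}(X \mid u \text{ is chosen})\, \mathbb{P}(u \text{ is chosen}) \\
&= \frac{1}{\omega(A)} \sum_{u \in A} \omega(u)\, \mathbb{E}(X \mid u \text{ is chosen}) \\
&= \frac{1}{\omega(A)} \sum_{u \in A} \omega(u) \Big( \mathbb{E}(X_{+u,+N(u)}) + \sum_{v \in N(u)} \omega(v) \Big) \\
&\le \frac{1}{\omega(A)} \sum_{u \in A} \omega(u) \Big( 2 \min(X_{+u,+N(u)}) + \sum_{v \in N(u)} \omega(v) \Big) && \text{by induction} \\
&\le \frac{1}{\omega(A)} \sum_{u \in A} \omega(u) \Big( 2 \Big( \min(X) - \sum_{v \in N_S(u)} \omega(v) \Big) + \sum_{v \in N(u)} \omega(v) \Big) && \text{by Lemma 2} \\
&\le 2 \min(X) + \frac{1}{\omega(A)} \sum_{u \in A} \omega(u) \Big( -2 \sum_{v \in N_S(u)} \omega(v) + \sum_{v \in N(u)} \omega(v) \Big) \\
&\le 2 \min(X) + \frac{1}{\omega(A)} \sum_{u \in A} \omega(u) \Big( \sum_{v \in N_H(u)} \omega(v) - \sum_{v \in N_S(u)} \omega(v) \Big) && (1)
\end{aligned}
\]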

For any edge $e = (u, v)$ of the incompatibility graph, we define its weight
as the product of the weights of its endvertices, that is $\omega(e) = \omega(u)\omega(v)$. Let
$e(H, S)$ denote the total weight of the edges linking $S$ to $H$, that is

\[ e(H, S) = \sum_{e = (u,v),\ u \in H,\ v \in S} \omega(e). \]

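Then, as $H$ is an independent set,

\[
\sum_{u \in H} \omega(u) \Big( \sum_{v \in N_H(u)} \omega(v) - \sum_{v \in N_S(u)} \omega(v) \Big)
= - \sum_{u \in H} \omega(u) \Big( \sum_{v \in N_S(u)} \omega(v) \Big)
= -e(H, S)
\]

and

\[
\sum_{u \in S} \omega(u) \Big( \sum_{v \in N_H(u)} \omega(v) - \sum_{v \in N_S(u)} \omega(v) \Big)
\le \sum_{u \in S} \omega(u) \Big( \sum_{v \in N_H(u)} \omega(v) \Big)
\le e(H, S).
\]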
Thus, Equation (1) yields

\[ \mathbb{E}(X) \le 2 \min(X), \]

which proves the theorem.



□ 
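On small instances, the bound of Theorem 3 can be checked exactly by following the same conditioning on the chosen vertex as in the proof. The sketch below computes the exact expectation and minimum of the output weight with rational arithmetic; the function name `exact_stats` and the star-graph instance are assumptions made for illustration, and the recursion is exponential in $|U|$.

```python
from fractions import Fraction

def vertex_cover_oracle(edges):
    """f(C) = 1 if C touches every edge, else 0 (example oracle, an assumption)."""
    def f(C):
        return 1 if all(u in C or v in C for u, v in edges) else 0
    return f

def exact_stats(universe, f, weight):
    """Return (expected weight, minimum weight) of the output over all runs."""
    U = frozenset(universe)

    def w(S):
        return sum((Fraction(weight[x]) for x in S), Fraction(0))

    def explore(IN, OUT):
        A = U - IN - OUT
        if not A:
            return w(IN), w(IN)
        total = w(A)
        expected, best = Fraction(0), None
        for u in A:
            new_out = OUT | {u}
            forced = frozenset(
                v for v in U - IN - new_out if f(U - (new_out | {v})) == 0
            )
            e, m = explore(IN | forced, new_out)
            # u is picked with probability weight[u] / weight(A).
            expected += Fraction(weight[u]) / total * e
            best = m if best is None or m < best else best
        return expected, best

    return explore(frozenset(), frozenset())

# Example instance (assumption): a star with center c and leaves x, y, z, unit weights.
edges = [("c", "x"), ("c", "y"), ("c", "z")]
U = {"c", "x", "y", "z"}
omega = {x: 1 for x in U}
mean, minimum = exact_stats(U, vertex_cover_oracle(edges), omega)
```

On this star the center is picked first with probability 1/4 (output weight 3) and a leaf with probability 3/4 (output weight 1), so the mean is 3/2 against an optimum of 1, consistent with the factor 2.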



Using the standard Markov inequality, this theorem makes it possible to obtain,
with probability arbitrarily close to 1, a $(2 + \alpha)$-approximation for every
positive $\alpha$, as stated in the following corollary.

Corollary 4. Consider any covering problem in NP. For every $\alpha > 0$ and
$\epsilon > 0$, there exists a polynomial time random algorithm whose output is a
$(2 + \alpha)$-approximation with probability at least $1 - \epsilon$.

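Proof. Consider one run of the Choose Outsiders First algorithm. Let $X$ be the
weight of the output and $\mathit{Opt}$ be the weight of an optimal solution. Then

\[
\begin{aligned}
\mathbb{P}(X \ge (2 + \alpha)\mathit{Opt}) &\le \frac{\mathbb{E}(X)}{(2 + \alpha)\mathit{Opt}} && \text{by Markov's inequality} \\
&\le \frac{2\,\mathit{Opt}}{(2 + \alpha)\mathit{Opt}} && \text{by Theorem 3} \\
&\le \frac{1}{1 + \alpha/2}.
\end{aligned}
\]

Thus, running the algorithm $p$ times with $p \ge \frac{\ln(1/\epsilon)}{\ln(1 + \alpha/2)}$ and taking the
minimum $X^*$ among all the outputs yields

\[
\mathbb{P}(X^* \ge (2 + \alpha)\mathit{Opt}) \le \Big( \frac{1}{1 + \alpha/2} \Big)^p \le \epsilon.
\]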



□ 
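The repetition argument of the proof can be sketched as follows. The function names `repetitions_needed` and `best_of_runs`, and the callables `run_once` and `cost` passed to them, are hypothetical; `run_once` stands for one independent run of Choose Outsiders First and `cost` for the weight function $\omega$ applied to a cover.

```python
import math

def repetitions_needed(alpha, eps):
    """Smallest integer p >= 1 such that (1 / (1 + alpha/2)) ** p <= eps."""
    if alpha <= 0 or not 0 < eps < 1:
        raise ValueError("need alpha > 0 and 0 < eps < 1")
    return max(1, math.ceil(math.log(1 / eps) / math.log(1 + alpha / 2)))

def best_of_runs(run_once, cost, alpha, eps):
    """Run the randomized algorithm p times and keep the cheapest output."""
    p = repetitions_needed(alpha, eps)
    return min((run_once() for _ in range(p)), key=cost)

# Example (assumption): a (2 + 1)-approximation with failure probability at most 1%.
p = repetitions_needed(1.0, 0.01)
```

The number of repetitions depends only on $\alpha$ and $\epsilon$, not on the size of the instance, which is why the amplified algorithm stays polynomial.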